Variance-Reduced Proximal Stochastic Gradient Descent for Non-convex Composite Optimization
Authors
Abstract
Here we study non-convex composite optimization, where the objective is the sum of two parts: first, a finite sum of smooth but non-convex functions, and second, a general function that admits a simple proximal mapping. Most research on stochastic methods for composite optimization assumes convexity or strong convexity of each function. In this paper, we extend the analysis to the non-convex setting using variance reduction techniques, namely prox-SVRG and prox-SAGA. We prove that, with a constant step size, both prox-SVRG and prox-SAGA are suitable for non-convex composite optimization and converge to a stationary point within O(1/ε) iterations. This matches the convergence rate of the state-of-the-art RSAG method and is faster than stochastic gradient descent. Our analysis also extends to the mini-batch setting, which linearly accelerates convergence. To the best of our knowledge, this is the first convergence-rate analysis of variance-reduced proximal stochastic gradient methods for non-convex composite optimization.
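To make the setting concrete, below is a minimal sketch of a prox-SVRG-style loop for this composite problem with a constant step size. It is an illustration under assumptions, not the paper's exact algorithm: the callables grad_fi and prox_r, the helper prox_l1, and all parameter names are hypothetical.

import numpy as np

# Minimal sketch (not the paper's algorithm) for min_x (1/n) * sum_i f_i(x) + r(x):
#   grad_fi(x, i)   -> gradient of the smooth, possibly non-convex component f_i at x
#   prox_r(v, step) -> proximal mapping of step * r evaluated at v
def prox_svrg(grad_fi, prox_r, x0, n, step, n_epochs, epoch_len, seed=0):
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()
    snapshot = x.copy()
    for _ in range(n_epochs):
        # Full gradient at the snapshot point: the anchor used for variance reduction.
        full_grad = np.mean([grad_fi(snapshot, i) for i in range(n)], axis=0)
        for _ in range(epoch_len):
            i = rng.integers(n)
            # Variance-reduced estimate of the smooth part's gradient.
            v = grad_fi(x, i) - grad_fi(snapshot, i) + full_grad
            # Proximal step handles the non-smooth term r.
            x = prox_r(x - step * v, step)
        snapshot = x.copy()  # refresh the snapshot after each epoch
    return x

# Example proximal mapping for r(x) = lam * ||x||_1 (soft-thresholding).
def prox_l1(v, step, lam=0.1):
    return np.sign(v) * np.maximum(np.abs(v) - step * lam, 0.0)

A prox-SAGA-style variant would replace the per-epoch snapshot with a table of the most recently evaluated component gradients, and a mini-batch version would average the variance-reduced estimate over a sampled batch before the proximal step.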
Related resources
Decoupled Asynchronous Proximal Stochastic Gradient Descent with Variance Reduction
In the era of big data, optimizing large-scale machine learning problems has become a challenging task that draws significant attention. Asynchronous optimization algorithms have emerged as a promising solution. Recently, decoupled asynchronous proximal stochastic gradient descent (DAP-SGD) was proposed to minimize a composite function. It is claimed to be able to offload the computation bottleneck from...
Asynchronous Stochastic Gradient Descent with Variance Reduction for Non-Convex Optimization
We provide the first theoretical analysis of the convergence rate of the asynchronous stochastic variance reduced gradient (SVRG) descent algorithm on non-convex optimization. Recent studies have shown that asynchronous stochastic gradient descent (SGD) based algorithms with variance reduction converge at a linear rate on convex problems. However, there is no work analyzing asy...
Asynchronous Stochastic Proximal Methods for Nonconvex Nonsmooth Optimization
We study stochastic algorithms for solving non-convex optimization problems with a convex yet possibly non-smooth regularizer, which find wide application in practical machine learning problems. However, compared to asynchronous parallel stochastic gradient descent (AsynSGD), an algorithm targeting smooth optimization, the understanding of the behavior of stochastic algorithms for the n...
Asynchronous Doubly Stochastic Proximal Optimization with Variance Reduction
In the big data era, both the sample size and the dimension can be huge at the same time. Asynchronous parallel technology was recently proposed to handle big data. Specifically, asynchronous stochastic (variance reduction) gradient descent algorithms were proposed to scale the sample size, and asynchronous stochastic coordinate descent algorithms were proposed to scale the dimens...
Larger is Better: The Effect of Learning Rates Enjoyed by Stochastic Optimization with Progressive Variance Reduction
In this paper, we propose a simple variant of the original stochastic variance reduced gradient (SVRG) [1], which hereafter we refer to as variance reduced stochastic gradient descent (VR-SGD). Different from the choices of the snapshot point and starting point in SVRG and its proximal variant, Prox-SVRG [2], the two vectors of each epoch in VR-SGD are set to the average and last iterate o...
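As a rough illustration of the end-of-epoch bookkeeping described above (a sketch under assumptions, since the description is truncated here; the function name and variables are hypothetical):

import numpy as np

def vr_sgd_epoch_end(epoch_iterates):
    # Per the description above, VR-SGD reportedly sets the next snapshot point to the
    # average of the finished epoch's iterates and the next starting point to its last
    # iterate, whereas SVRG/Prox-SVRG typically reuse a single vector for both roles.
    snapshot = np.mean(epoch_iterates, axis=0)  # anchor for the variance-reduced gradient
    start = epoch_iterates[-1]                  # initial point of the next epoch's inner loop
    return snapshot, start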
Journal: CoRR
Volume: abs/1606.00602
Pages: -
Publication date: 2016